
Statistical Match Procedure Used in the 2005-2006 Baselines

The statistical match procedure used in the 2005-2006 baselines is an unconstrained nearest-neighbor match, very similar to the one used in 2003-2004 but with additional modifications that restore variation in top-coded incomes. Before matching, the CPS and PUF are divided into mutually exclusive groups, and matches are allowed only within a group. The groups are defined by the following "blocking variables":

  • filing status - whether the taxpayer files a single, joint, or head of household return
  • aged status - whether the taxpayer (or spouse in the case of a joint return) is age 65 or over
  • dependents - the number of dependents claimed by the taxpayer (none, one, or two or more)
  • dependency status - whether the taxpayer can be claimed as a dependent on another return
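As a minimal sketch (the record layout and field names are hypothetical), the four blocking variables can be collapsed into a single key, so that a CPS tax unit and a PUF record are compared only when their keys agree:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxUnit:
    # Hypothetical record layout; field names are illustrative only.
    record_id: str
    filing_status: str   # "single", "joint", or "head_of_household"
    aged: bool           # taxpayer (or spouse on a joint return) is 65 or over
    num_dependents: int  # number of dependents claimed
    is_dependent: bool   # can be claimed as a dependent on another return


def block_key(unit: TaxUnit) -> tuple:
    """Return the blocking cell for a tax unit."""
    # Dependents are collapsed to none, one, or two or more.
    dependents_cell = min(unit.num_dependents, 2)
    return (unit.filing_status, unit.aged, dependents_cell, unit.is_dependent)


def group_by_block(units):
    """Partition tax units into mutually exclusive blocking cells."""
    cells = defaultdict(list)
    for unit in units:
        cells[block_key(unit)].append(unit)
    return dict(cells)
```

Under this key, a CPS unit claiming three dependents and a PUF record claiming two fall into the same cell, since both count as "two or more".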
Once the blocking variables have identified the set of PUF records that can be matched to a given CPS tax unit, a PUF record is selected using a "minimum distance" function. This function takes into account the difference between the CPS tax unit and the PUF record for each of ten income items reported on both the CPS and the PUF: wages and salary, interest income, dividend income, pensions and retirement income, total social security benefits, unemployment compensation, rental income or loss, self-employment income or loss, farm income or loss, and alimony received. Once a record is selected, its variables are assigned to the CPS tax unit, and the weight of the PUF record is reduced by the weight of the CPS tax unit. Once a PUF record's weight has been reduced to zero, it cannot be matched to additional CPS tax units.
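The matching step within one blocking cell can be sketched as follows. The text does not give the exact form of the distance function, so this sketch assumes an unscaled sum of absolute differences over the ten income items; the item names, the dict-based records, and the order in which CPS units are processed are all assumptions.

```python
INCOME_ITEMS = (
    "wages", "interest", "dividends", "pensions", "social_security",
    "unemployment", "rental", "self_employment", "farm", "alimony",
)


def distance(cps, puf, items=INCOME_ITEMS):
    """Assumed distance: sum of absolute differences across income items."""
    return sum(abs(cps.get(k, 0.0) - puf.get(k, 0.0)) for k in items)


def match_block(cps_units, puf_records):
    """Match each CPS unit in one blocking cell to its nearest PUF record.

    Records are dicts with an "id", a "weight", and income items.
    CPS units are processed in input order (an assumption).
    """
    remaining = {p["id"]: p["weight"] for p in puf_records}
    matches = {}
    for cps in cps_units:
        # Only PUF records with weight left are eligible donors.
        candidates = [p for p in puf_records if remaining[p["id"]] > 0]
        if not candidates:
            matches[cps["id"]] = None
            continue
        best = min(candidates, key=lambda p: distance(cps, p))
        matches[cps["id"]] = best["id"]
        # Reduce the donor's weight by the CPS unit's weight, floored at zero.
        remaining[best["id"]] = max(0.0, remaining[best["id"]] - cps["weight"])
    return matches, remaining
```

For example, if PUF record A (weight 100) is the closest donor for a CPS unit of weight 100, A's weight drops to zero and the next CPS unit, even an identical one, must take its variables from another record.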

Several additional constraints are imposed on the matching algorithm, each of which reduces the number of PUF records that are potential matches to a particular TRIM3 record. These constraints relate to:

  • Capital Gains and Transfer Program Recipients. The statistical match does not assign capital gains to tax units receiving SSI, TANF, public or subsidized housing, or food stamp benefits.
  • Home Ownership. A TRIM3 tax unit must own a house in order to be matched with a PUF tax unit that claims itemized deductions for home mortgage interest expenses or real estate taxes.
  • State and Local Tax Deductions. A TRIM3 tax unit in a state without a state income tax can only be matched to a PUF record claiming the state and local income tax deduction if the PUF tax unit is also in a state without a state income tax.
  • Adjustments for Keogh/SEP Contributions. A TRIM3 tax unit must have business or farm self-employment income in order to be matched with a PUF tax unit that claims adjustments to income for contributions to Keogh or SEP retirement accounts.
  • Child and Dependent Care Expenses. A TRIM3 tax unit must have qualifying child care expenses to be matched to a PUF tax unit that claims child and dependent care expenses.
  • PUF Variables Exceeding Prescribed Levels. A single large value for a PUF variable can produce skewed results if the PUF record in question represents only a few tax units but is matched to a CPS tax unit representing many tax units. To avoid this problem, the match procedure disallows matches to PUF records with very large values for certain variables.
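The constraints above act as filters on the candidate PUF records before the minimum-distance step. The sketch below encodes them as a single predicate; every field name is hypothetical, treating any nonzero PUF capital gains amount as "capital gains" is an assumption, and the `caps` thresholds stand in for the prescribed levels, which the text does not list.

```python
def is_eligible(cps, puf, caps):
    """Return True if a PUF record may be matched to a TRIM3 tax unit.

    All field names and the `caps` thresholds are hypothetical.
    """
    # Capital gains are not assigned to transfer-program recipients.
    on_transfers = any(cps[k] for k in ("ssi", "tanf", "housing", "food_stamps"))
    if on_transfers and puf["capital_gains"] != 0:
        return False
    # Mortgage-interest or real-estate-tax deductions require a homeowner.
    if not cps["owns_home"] and (puf["mortgage_interest"] > 0 or puf["real_estate_tax"] > 0):
        return False
    # A CPS unit in a no-income-tax state may take a state income tax
    # deduction only from a PUF unit that is also in such a state.
    if (cps["no_state_income_tax"] and puf["state_income_tax_deduction"] > 0
            and not puf["no_state_income_tax"]):
        return False
    # Keogh/SEP adjustments require business or farm self-employment income.
    if not cps["has_se_income"] and puf["keogh_sep"] > 0:
        return False
    # Child care expenses on the PUF require qualifying expenses on the CPS side.
    if not cps["has_child_care_expenses"] and puf["child_care_expenses"] > 0:
        return False
    # Disallow donors whose capped variables exceed the prescribed levels.
    return all(puf.get(var, 0.0) <= cap for var, cap in caps.items())
```

In a full match, `is_eligible` would be applied to each record in the blocking cell, and only the records that pass would be scored by the distance function.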

The 2005 baseline introduced a practice of using the PUF to restore variation to top-coded CPS incomes. During this era of the CPS, the Census Bureau top-coded income amounts exceeding certain thresholds in order to preserve confidentiality, and replaced top-coded amounts with averages calculated for all top-coded individuals. The replacement values used for earned income variables varied by gender, race/ethnicity, and whether the person worked full-time for the full-year. The goal of the 2005 improvements was to increase variation in income amounts over the threshold, allowing for more precise calculation of taxes.
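To see why top-coding removes variation, the Census-style replacement can be sketched as follows: every amount above the threshold is replaced by the mean of the top-coded amounts in its cell. The cell definition is passed in as a function. Unweighted means and a single "earnings" field are simplifying assumptions, and the Census Bureau's actual thresholds and cells, which varied by year, are not reproduced here.

```python
from collections import defaultdict


def topcode_with_cell_means(people, threshold, cell_of):
    """Replace amounts above `threshold` with the mean top-coded amount in the cell.

    `people` is a list of dicts with an "earnings" amount; `cell_of` maps a
    person to a cell (e.g. gender, race/ethnicity, full-year full-time status).
    Unweighted means are an assumption. Input records are not modified.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for person in people:
        if person["earnings"] > threshold:
            cell = cell_of(person)
            totals[cell] += person["earnings"]
            counts[cell] += 1
    result = []
    for person in people:
        copy = dict(person)
        if person["earnings"] > threshold:
            cell = cell_of(person)
            copy["earnings"] = totals[cell] / counts[cell]
        result.append(copy)
    return result
```

With a threshold of 200,000 and cells defined by sex, two women reporting 300,000 and 500,000 would both be recorded at 400,000. This is the lost variation that the PUF match is meant to restore.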

The statistical match restores variation to the following top-coded CPS income variables: wages, business income, farm income, interest, pensions, a combined variable representing dividends, estates, and trusts, and a combined variable representing rents and royalties. If a tax unit has top-coded income from one or more of these sources, then two additional modifications are made to the matching algorithm:

  • We remove the top-coded income variable(s) from the distance function.
  • We impose the additional constraint that only PUF records with values in excess of the censoring point (the point at which CPS top-coding begins) are eligible to be matched to the TRIM3 tax unit.
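Both modifications can be sketched for a single top-coded CPS unit as follows. The sketch assumes that the top-coded CPS variables map directly onto distance items with the same names, which simplifies the combined dividends/estates/trusts and rents/royalties variables. It also assumes that a donor must exceed the censoring point for every top-coded item, and it omits weight bookkeeping for brevity.

```python
INCOME_ITEMS = (
    "wages", "interest", "dividends", "pensions", "social_security",
    "unemployment", "rental", "self_employment", "farm", "alimony",
)


def match_topcoded_unit(cps, topcoded, puf_records, censor_points):
    """Nearest-neighbor match for a CPS unit with top-coded income sources.

    `topcoded` names the unit's top-coded items; `censor_points` gives the
    CPS top-coding threshold for each. Weight bookkeeping is omitted.
    """
    # 1. Drop the top-coded items from the distance function.
    items = [k for k in INCOME_ITEMS if k not in topcoded]
    # 2. Keep only PUF records above the censoring point for every top-coded item.
    candidates = [
        p for p in puf_records
        if all(p.get(k, 0.0) > censor_points[k] for k in topcoded)
    ]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda p: sum(abs(cps.get(k, 0.0) - p.get(k, 0.0)) for k in items),
    )
```

Because wages are excluded from the distance, a donor reporting 900,000 in wages can be chosen over one reporting 400,000 if its other income items are closer to the CPS unit's.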
Once a PUF record has been matched to a top-coded tax unit, we replace the CPS income amount (for any top-coded variables) with the amount obtained from the PUF record. The modified income variables do not overwrite the CPS income variables stored in the TRIM3 database, but are stored as a set of alternative income variables for use as input to the Federal Tax simulation. In particular, the baselines labeled "highinc" use the PUF income values, while the regular baselines use the CPS income values.
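The storage arrangement can be sketched as two parallel income sets per tax unit. The labels "regular" and "highinc" follow the baseline naming above, while the dictionary layout itself is hypothetical.

```python
def build_income_sets(cps_incomes, topcoded, puf_record):
    """Return the regular (CPS) and "highinc" (PUF-substituted) income sets.

    The CPS values are copied, never overwritten; only the top-coded items
    take the PUF donor's amount in the alternative set.
    """
    regular = dict(cps_incomes)
    highinc = dict(cps_incomes)
    for item in topcoded:
        highinc[item] = puf_record[item]
    return {"regular": regular, "highinc": highinc}
```

Non-top-coded items keep their CPS values in both sets, so the two baselines differ only in the sources that were top-coded.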

Because all of the variables obtained through the statistical match for an individual tax unit come from a single PUF record, our ability to align any specific variable to a target is limited. We perform some minimal alignment by adjusting the dollar thresholds used to disallow matches to PUF records with large income or deduction amounts.
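The text does not say how the dollar thresholds are adjusted. One possible automated approach is bisection on a single threshold. The sketch below is an assumption, not the documented procedure. It assumes that the simulated aggregate never decreases as the threshold rises and that the upper bound already reaches the target. In practice, `aggregate_for_cap` (a hypothetical callable) would rerun the match and tax simulation for each trial threshold.

```python
def tune_cap(aggregate_for_cap, target, lo, hi, iterations=60):
    """Find (approximately) the smallest cap whose simulated aggregate reaches `target`.

    Assumes aggregate_for_cap(cap) is non-decreasing in cap, that
    aggregate_for_cap(lo) < target, and that aggregate_for_cap(hi) >= target.
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if aggregate_for_cap(mid) >= target:
            hi = mid  # target still reached: try a lower cap
        else:
            lo = mid  # target missed: the cap must be higher
    return hi
```

For a toy aggregate that sums donor values of 10, 20, 30, and 40 up to the cap, a target of 60 yields a cap of about 30.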

The 2005 and 2006 baselines used the 2004 version of the IRS PUF.